fix(go-api): sync document handler interface and enforce preview acce…#15688
fix(go-api): sync document handler interface and enforce preview acce…#15688Hz-186 wants to merge 2 commits into
Conversation
…ss check, with unit tests
📝 WalkthroughWalkthroughThis PR adds three document serving endpoints to the HTTP API. GetDocumentArtifact and GetDocumentPreview serve sandbox artifacts and document previews from object storage with validated filenames and content-types. DownloadDocument retrieves document binaries linked to specific datasets. All endpoints set appropriate Content-Type and Content-Disposition headers. ChangesDocument Serving Endpoints
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
internal/handler/document_test.go (2)
480-539: ⚡ Quick winAdd tests for Content-Disposition headers and parameter validation.
The test coverage for
GetDocumentArtifactis missing:
- Validation of the
Content-Dispositionheader. According to the PR objectives, there's attachment handling logic (shouldForceArtifactAttachment). The success test should verify that whenForceAttachmentis false, the header is set toinline; filename="safe.txt", and there should be a test case whereForceAttachmentis true to verify theattachmentdisposition.- Edge case tests for parameter validation (e.g., empty or missing
filenameparameter).📋 Suggested test additions
Add a test to verify the Content-Disposition header in the success case:
func TestGetDocumentArtifact_Success(t *testing.T) { gin.SetMode(gin.TestMode) h := &DocumentHandler{ documentService: &fakeDocumentService{}, } c, w := setupGinContextWithUser("GET", "/api/v1/documents/artifact/test.txt", "") c.Params = gin.Params{{Key: "filename", Value: "test.txt"}} h.GetDocumentArtifact(c) if w.Code != http.StatusOK { t.Fatalf("expected 200, got %d", w.Code) } if w.Header().Get("Content-Type") != "text/plain" { t.Fatalf("unexpected content type: %s", w.Header().Get("Content-Type")) } + if contentDisp := w.Header().Get("Content-Disposition"); contentDisp != "inline; filename=\"safe.txt\"" { + t.Fatalf("unexpected Content-Disposition: %s", contentDisp) + } if w.Body.String() != "artifact content" { t.Fatalf("unexpected body: %s", w.Body.String()) } }Add a test for the attachment disposition case by extending the fake service to return a response with
ForceAttachment: truefor a specific filename:func TestGetDocumentArtifact_Attachment(t *testing.T) { gin.SetMode(gin.TestMode) // Create a custom fake that returns ForceAttachment=true fake := &fakeDocumentService{} // Override the method for this specific test, or extend fake to handle "attachment.txt" h := &DocumentHandler{ documentService: fake, } c, w := setupGinContextWithUser("GET", "/api/v1/documents/artifact/attachment.txt", "") c.Params = gin.Params{{Key: "filename", Value: "attachment.txt"}} h.GetDocumentArtifact(c) if w.Code != http.StatusOK { t.Fatalf("expected 200, got %d", w.Code) } if contentDisp := w.Header().Get("Content-Disposition"); !strings.HasPrefix(contentDisp, "attachment;") { t.Fatalf("expected attachment disposition, got: %s", contentDisp) } }Add a test for empty filename:
func TestGetDocumentArtifact_EmptyFilename(t *testing.T) { gin.SetMode(gin.TestMode) h := &DocumentHandler{ documentService: &fakeDocumentService{}, } c, w := setupGinContextWithUser("GET", "/api/v1/documents/artifact/", "") c.Params = gin.Params{{Key: "filename", Value: ""}} h.GetDocumentArtifact(c) if w.Code != http.StatusOK { t.Fatalf("expected 200, got %d", w.Code) } var resp map[string]interface{} json.Unmarshal(w.Body.Bytes(), &resp) if resp["code"] == float64(common.CodeSuccess) { t.Fatal("expected error for empty filename") } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/handler/document_test.go` around lines 480 - 539, Update tests for GetDocumentArtifact on DocumentHandler to assert Content-Disposition behavior and validate filename params: extend or modify fakeDocumentService to return ForceAttachment=true for a specific filename (e.g., "attachment.txt") and add a TestGetDocumentArtifact_Attachment that calls DocumentHandler.GetDocumentArtifact and asserts w.Header().Get("Content-Disposition") starts with "attachment;"; also enhance the success test (TestGetDocumentArtifact_Success) to assert Content-Disposition equals `inline; filename="safe.txt"` when ForceAttachment is false (use the fake to return that behavior), and add a TestGetDocumentArtifact_EmptyFilename that calls GetDocumentArtifact with an empty gin Param filename and asserts the response is an error (non-success code) instead of CodeSuccess; reference DocumentHandler, GetDocumentArtifact, and fakeDocumentService to locate the code to change.
541-621: ⚡ Quick winAdd tests for Content-Disposition headers and dataset mismatch.
The test coverage is missing:
- Validation of
Content-Dispositionheaders for bothGetDocumentPreviewandDownloadDocument. The PR objectives mention these endpoints set appropriate headers.- For
DownloadDocument, a critical test case for dataset mismatch: when a document exists but belongs to a different dataset than the one specified in the URL. According to the PR objectives, the service "enforce[s] strict dataset alignment and access checks."- Parameter validation edge cases (empty
docID, emptydatasetID).📋 Suggested test additions
Add Content-Disposition validation to the success tests:
func TestGetDocumentPreview_Success(t *testing.T) { // ... existing setup ... h.GetDocumentPreview(c) if w.Code != http.StatusOK { t.Fatalf("expected 200, got %d", w.Code) } if w.Header().Get("Content-Type") != "text/plain" { t.Fatalf("unexpected content type: %s", w.Header().Get("Content-Type")) } + if contentDisp := w.Header().Get("Content-Disposition"); contentDisp != "inline; filename=\"preview.txt\"" { + t.Fatalf("unexpected Content-Disposition: %s", contentDisp) + } if w.Body.String() != "preview content" { t.Fatalf("unexpected body: %s", w.Body.String()) } }Add a dataset mismatch test for DownloadDocument:
func TestDownloadDocument_DatasetMismatch(t *testing.T) { gin.SetMode(gin.TestMode) // Extend fakeDocumentService to return an error for dataset mismatch // or use a specific combination that triggers this error fake := &fakeDocumentService{} h := &DocumentHandler{ documentService: fake, } c, w := setupGinContextWithUser("GET", "/api/v1/datasets/wrong-dataset/documents/doc-1", "") c.Params = gin.Params{{Key: "dataset_id", Value: "wrong-dataset"}, {Key: "document_id", Value: "doc-1"}} h.DownloadDocument(c) if w.Code != http.StatusOK { t.Fatalf("expected 200, got %d", w.Code) } var resp map[string]interface{} json.Unmarshal(w.Body.Bytes(), &resp) // Verify appropriate error code for dataset mismatch if resp["code"] == float64(common.CodeSuccess) { t.Fatal("expected error for dataset mismatch") } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/handler/document_test.go` around lines 541 - 621, Add assertions for Content-Disposition headers in TestGetDocumentPreview_Success (verify the header includes an appropriate filename and disposition like "inline; filename=...") and in TestDownloadDocument_Success (verify "attachment; filename=..."), extend fakeDocumentService or add a specific behavior in the fake used by DownloadDocument tests so that requesting doc-1 with a different dataset_id returns a dataset-mismatch error and write a new TestDownloadDocument_DatasetMismatch that calls DownloadDocument with dataset_id "wrong-dataset" and asserts the response error code is not common.CodeSuccess, and finally add lightweight parameter validation tests calling GetDocumentPreview and DownloadDocument with empty docID and/or datasetID (use setupGinContextWithUser and setting c.Params to empty "") expecting the handler to return the appropriate error code; reference GetDocumentPreview, DownloadDocument, fakeDocumentService, setupGinContextWithUser in your changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/handler/document.go`:
- Around line 480-482: The Content-Disposition header is using res.FileName
directly, risking header injection; update the download handler to sanitize the
filename before use by calling the same sanitizer used elsewhere
(sanitizeArtifactFilename) or by using the already-sanitized
artifact.SafeFilename pattern like in GetDocumentArtifact; replace
fmt.Sprintf(`attachment; filename="%s"`, res.FileName) with a safe value (e.g.,
safeName := sanitizeArtifactFilename(res.FileName) and use safeName) so
quotes/newlines are removed/escaped before setting the header and keep
c.Data(...) unchanged.
---
Nitpick comments:
In `@internal/handler/document_test.go`:
- Around line 480-539: Update tests for GetDocumentArtifact on DocumentHandler
to assert Content-Disposition behavior and validate filename params: extend or
modify fakeDocumentService to return ForceAttachment=true for a specific
filename (e.g., "attachment.txt") and add a TestGetDocumentArtifact_Attachment
that calls DocumentHandler.GetDocumentArtifact and asserts
w.Header().Get("Content-Disposition") starts with "attachment;"; also enhance
the success test (TestGetDocumentArtifact_Success) to assert Content-Disposition
equals `inline; filename="safe.txt"` when ForceAttachment is false (use the fake
to return that behavior), and add a TestGetDocumentArtifact_EmptyFilename that
calls GetDocumentArtifact with an empty gin Param filename and asserts the
response is an error (non-success code) instead of CodeSuccess; reference
DocumentHandler, GetDocumentArtifact, and fakeDocumentService to locate the code
to change.
- Around line 541-621: Add assertions for Content-Disposition headers in
TestGetDocumentPreview_Success (verify the header includes an appropriate
filename and disposition like "inline; filename=...") and in
TestDownloadDocument_Success (verify "attachment; filename=..."), extend
fakeDocumentService or add a specific behavior in the fake used by
DownloadDocument tests so that requesting doc-1 with a different dataset_id
returns a dataset-mismatch error and write a new
TestDownloadDocument_DatasetMismatch that calls DownloadDocument with dataset_id
"wrong-dataset" and asserts the response error code is not common.CodeSuccess,
and finally add lightweight parameter validation tests calling
GetDocumentPreview and DownloadDocument with empty docID and/or datasetID (use
setupGinContextWithUser and setting c.Params to empty "") expecting the handler
to return the appropriate error code; reference GetDocumentPreview,
DownloadDocument, fakeDocumentService, setupGinContextWithUser in your changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dcee25d0-5931-43f3-9e66-7334ed7d9b90
📒 Files selected for processing (5)
internal/handler/document.gointernal/handler/document_test.gointernal/router/router.gointernal/service/document.gointernal/service/document_test.go
| c.Header("Content-Type", res.ContentType) | ||
| c.Header("Content-Disposition", fmt.Sprintf(`attachment; filename="%s"`, res.FileName)) | ||
| c.Data(http.StatusOK, res.ContentType, res.Data) |
There was a problem hiding this comment.
Sanitize filename in Content-Disposition header to prevent header injection.
res.FileName originates from doc.Name and is used directly in the header without sanitization. If the document name contains characters like ", \r, or \n, this could result in malformed headers or header injection.
The GetDocumentArtifact handler correctly uses artifact.SafeFilename (which is sanitized via sanitizeArtifactFilename). Apply similar sanitization here for consistency and safety.
Proposed fix
+func sanitizeDownloadFilename(filename string) string {
+ // Remove or replace characters that could cause issues in Content-Disposition headers
+ replacer := strings.NewReplacer(`"`, `_`, "\r", "", "\n", "", "\\", "_")
+ return replacer.Replace(filename)
+}
+
func (h *DocumentHandler) DownloadDocument(c *gin.Context) {
// ... existing code ...
c.Header("Content-Type", res.ContentType)
- c.Header("Content-Disposition", fmt.Sprintf(`attachment; filename="%s"`, res.FileName))
+ c.Header("Content-Disposition", fmt.Sprintf(`attachment; filename="%s"`, sanitizeDownloadFilename(res.FileName)))
c.Data(http.StatusOK, res.ContentType, res.Data)
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/handler/document.go` around lines 480 - 482, The Content-Disposition
header is using res.FileName directly, risking header injection; update the
download handler to sanitize the filename before use by calling the same
sanitizer used elsewhere (sanitizeArtifactFilename) or by using the
already-sanitized artifact.SafeFilename pattern like in GetDocumentArtifact;
replace fmt.Sprintf(`attachment; filename="%s"`, res.FileName) with a safe value
(e.g., safeName := sanitizeArtifactFilename(res.FileName) and use safeName) so
quotes/newlines are removed/escaped before setting the header and keep
c.Data(...) unchanged.
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #15688 +/- ##
=======================================
Coverage 93.16% 93.16%
=======================================
Files 10 10
Lines 717 717
Branches 118 118
=======================================
Hits 668 668
Misses 29 29
Partials 20 20 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Description
This PR syncs the
documentServiceIfaceinterface and introduces handler methods for document preview, artifact fetching, and downloading in the Go API. It also ensures that strict dataset alignment and access checks are enforced when retrieving or downloading documents.Furthermore, this PR introduces comprehensive unit tests for both the newly added Handler and Service methods to ensure robustness and prevent future regressions.
Key Changes
internal/router/router.go.documentServiceIfacewithGetDocumentArtifact,GetDocumentPreview, andDownloadDocument.internal/handler/document.go.internal/service/document.goto strictly check if a document belongs to the requested dataset before allowing downloads or previews.sanitizeArtifactFilename) and attachment handling (shouldForceArtifactAttachment).internal/handler/document_test.go): Added mock service implementations and Gin router tests covering success, not-found, and internal error states for all 3 new endpoints.internal/service/document_test.go): Added extensive business logic tests including dataset mismatch checks, non-existent document checks, and artifact file validation.Testing
All unit tests have been executed locally and passed successfully:
TestGetDocumentArtifact_*(Success, NotFound, UnexpectedError, InvalidFilename, InvalidFileType)TestGetDocumentPreview_*(Success, NotFound)TestDownloadDocument_*(Success, MissingDocID, WrongDataset)TestArtifactHelpers(Sanitize logic & content type assertions)